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Abstract 

Background: Object detection in 3-D medical images is often necessary for 
constraining a segmentation or registration task. It may be a task in its own right as 
well, when instances of a structure, e.g. the lymph nodes, are searched. Problems 
from occlusion, illumination and projection do not arise, making the problem simpler 
than object detection in photographies. However, objects of interest are often not 
well contrasted against the background. Influence from noise and other artifacts is 
much stronger and shape and appearance may vary substantially within a class. 

Methods: Deformable models capture the characteristic shape of an anatomic object 
and use constrained deformation for hypothesing object boundaries in image 
regions of low or non-existing contrast. Learning these constraints requires a large 
sample data base. We show that training may be replaced by readily available user 
knowledge defining a prototypical deformable part model. If structures have a strong 
part-relationship, or if they may be found based on spatially related guiding 
structures, or if the deformation is rather restricted, the supporting data information 
suffices for solving the detection task. We use a finite element model to represent 
anatomic variation by elastic deformation. Complex shape variation may be 
represented by a hierarchical model with simpler part variation. The hierarchy may 
be represented explicitly as a hierarchy of sub-shapes, or implicitly by a single 
integrated model. Data support and model deformation of the complete model can 
be represented by an energy term, serving as quality-of-fit function for object 
detection. 

Results: The model was applied to detection and segmentation tasks in various 
medical applications in 2- and 3-D scenes. It has been shown that model fitting and 
object detection can be carried out efficiently by a combination of a local and global 
search strategy using models that are parameterized for the different tasks. 

Conclusions: A part-based elastic model represents complex within-class object 
variation without training. The hierarchy of parts may specify relationship to 
neighboring anatomical objects in object detection or a part-decomposition of a 
complex anatomic structure. The intuitive way to incorporate domain knowledge has 
a high potential to serve as easily adaptable method to a wide range of different 
detection tasks in medical image analysis. 
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Background 

Motivation 

Although medical images provide insight into patient-specific human anatomy other- 
wise not accessible, their interpretation still requires extensive expert knowledge to fill 
the semantic gap between image data and depicted anatomy and/or function. Image 
characteristics of the object-of-interest may be known but are often not unique. 
Furthermore, noise, imaging or reconstruction artifacts alter the depicted content to a 
much greater extent than for pictures such as photos. Anatomical objects are not 
always contrasted well against each other and neighboring structures have similar 
appearance. However, most of the data is 3-D. Change of shape and appearance due to 
projection has not to be dealt with. 

Still, detection and segmentation is possible if a suitable model completes missing 
information and corrects measurement errors in the data. The premier effect of such 
deficient information would be an incorrect delineation of the object's boundary. 
Hence, shape is the most important supplementary model information. Context infor- 
mation about adjacent structures, as well as information about object and background 
appearance may be added for discriminating among similar objects. 

Developing a different method for each new detection or segmentation task is ineffi- 
cient. Thus, a major challenge is to find a parameterizable representation that can be 
adapted efficiently for finding different objects. 

Object shape can be represented by sampling points on its boundary. If such model 
is fitted to the data, boundary points of a model instance are registered with likely 
boundary locations in the image (e.g., high gradient strength locations). If artifacts or 
low contrast cause some model points not to have counter parts in the image, the 
model instance predicts the local course of the boundary there. Visible shape parts 
have to be sufficiently characteristic so that this prediction does not result in inaccep- 
table errors. 

Several problems have to be dealt with: 

- The model has to include within-class variation of the structure of interest, while 
inhibiting influences from between-class variation. 

- A weight for combining data and model needs to be set appropriately. 

- The search for object instances in the data has to deal with many local minima of 
the corresponding optimization criterion. 

Fusing shape model and image data requires a quality-of-fit (QoF) function where 
supporting information from the data are represented by a term E extema i, which is regu- 
larized by a term Ei nterna i, representing, in this case, deviation from the prototypical 
shape model: 

E = E{ n i erna \ + X ■ E ex temal (1) 

This expression is found in many formulations of segmentation and detection tasks. 
It may be interpreted as an energy balance where external influences from a potential 
field generated by the data are counteracted by internal forces that represent domain 
knowledge such as the shape of an object or just the smoothness of its boundary. The 
parameter A weighs the reliability of these two kinds of information. The equation can 
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also be interpreted as logarithm of the conditional probability given by a normal- 
distributed likelihood function based on E externa \ and a normal-distributed a priori prob- 
ability depending on Ei ntema \. Here, A represents the ratio of variances of the two 
distributions. 

Equation (1) may be optimized by gradient descent. It requires that the model 
instance is initialized sufficiently close to the optimum searched for. For global optimi- 
zation, stochastic search techniques, e.g., the method by [1] may be used, which draws 
randomly distributed initializations and selects the best fitting candidates based on (1). 

With this paper, we present an elastically deformable model as representation for 
shape and appearance. We argue that this model may be used efficiently for object 
detection as it does represent shape variation and expected appearance for an object 
class without requiring training (although training may improve performance). The 
main reason for this lies in the fact that model information is represented by integrat- 
ing simple deformable shapes in a part-based model that, depending on the task, either 
represents object parts or guiding structures. We show that this model can be applied 
to solve different detection and segmentation tasks in medical image analysis. 

The remainder of the paper is structured as follows. We first present previous work 
on models in object detection for medical images. We then present our elastic model, 
its application to object detection and its extension to a part-based model. We con- 
clude with a number of examples that show the different capabilities of applying the 
model to detection and segmentation tasks. 

Previous work 

Extracting an object-of-interest from the image contradicts the usual assumption for 
segmentation that segment characteristics are part of the data information. Extracting, 
e.g., a liver from CT images may require separating it from gall bladder, stomach, and 
feeding vessel. Depending on imaging modality and protocol the appearance of these 
structures may be very similar and insufficient for separation. 

Extraction may be simple for a human observer with necessary knowledge about 
shape, appearance and location of the organ. In a computer-guided solution, a detec- 
tion task has to be solved first, which supports the subsequent boundary delineation 
task. The two tasks may be solved in parallel or sequentially. In the latter case, the 
detection result constrains the subsequent segmentation. In the former case, a model 
instance is expected to deform into the object that is searched for. 

The information that is needed for detection and segmentation can be quite com- 
plex. Necessary domain knowledge may be introduced at several stages of the process 
and the "intelligence" of the solution lies in the construction of the process from its 
sub-tasks (e.g., the kidney detection and segmentation scheme of [2]). Alternatively, a 
model can be built that contains information of all aspects about the object necessary 
for identification in the data including possible dependencies between different types 
of information. Detection is then fitting a model to measured data (e.g., the application 
of a shape and appearance model for detecting the left ventricle of the heart in MRI 
[3]). We prefer the latter approach as it separates model description from the search 
for model instances and may be adapted easier to a new detection task. 

If detection precedes delineation, it localizes the object-of-interest. In this case, the 
result locally constrains boundary delineation if the object is not sufficiently contrasted 
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against adjacent structures. Localization may be interactive (e.g., seed points for region 
growing [4], closed active contours [5], or starting points for live wire delineation [6]). 
In this case, the detection result often includes little information with respect to object 
boundary, since interaction costs are usually kept to a minimum. Hence, interactive 
localization is most suitable if the boundary itself is well defined by the data and inter- 
action is used to find the object among different structures with similar appearance. 
Small artifacts such as noise can be accommodated by requiring smooth and closed 
boundaries. If substantial parts of the boundary cannot be derived from data informa- 
tion, they need to be added by interactive delineation of boundary parts. 

Localization by matched filter methods such as vesselness filters [7], blobness [8], 
template matching [9] and Hough transform [10] does not require interaction, except 
possibly for specification of a relevancy threshold for a successful detection. These 
methods are capable of finding multiple instances of an object. The filters predict the 
(average) object shape and appearance which may then be used as prior for data-driven 
segmentation, e.g., by graph cuts [11-13] or level sets [14,15]. The methods operate 
with few parameters which usually can be found easily. Of course, it restricts shape 
representation either to variable, but simple shapes (e.g., the vesselness filter) for which 
intra-class variation can be described by few parameters, or to a representation by a 
single average shape (e.g., the generalized Hough transform). 

Representing acceptable shape variation of complex object shape is more difficult. 
Unconstrained shape variation will cause the model to fit almost everywhere in the 
image while an overly constrained shape will not fit well to the data. Active Shape 
Models (ASM) and its many variants have found widespread use since their introduc- 
tion in [16]. Shape is represented by coordinates of a set of labeled points. If appear- 
ance variation shall be represented as well, it is given by intensity or texture variation 
at these points. Variation is trained from correspondingly labeled test data. Influence 
from rotation, translation and sometimes scaling has to be removed by Procrustes ana- 
lysis [17]. Variation is assumed to be normal-distributed and highly correlated. Hence, 
principal component analysis is used for de-correlation and information is reduced to 
a few eigenmodes (modes of variation). The model enables a potentially arbitrarily 
exact fitting of the missing boundary, as long as the visible part of the boundary suffi- 
ciently constrains the variation of the missing part. 

Training of such a point distribution model (PDM) can serve two purposes. It estimates 
shape (and appearance) variation of the class of objects to be delineated and it restricts 
shape variation. The success of PDMs for model-guided segmentation has been shown in 
numerous publications (e.g., [5,18,19]). Often only few training samples - considering the 
degrees of freedom of the untrained model - are needed for segmentation. The reason is 
probably twofold. Firstly, within-class shape variation has much fewer degrees of freedom 
than the model so that the true co-variance can be estimated from few samples. Secondly, 
boundary delineation does not require exact estimates of intra-class variation anyway as 
long as incorrectly estimated shape variation is corrected from boundary information in 
the data. 

Besides PDM, the Hough transform may be extended into a probabilistic shape 
model as well [20]. However, compared to a PDM, its versatility is restricted since 
appearance information is not part of the model and it is unclear how the trained 
information may be used for subsequent boundary delineation. 
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Since a shape model has not to be very accurate for boundary delineation, prototypical 
variation has also been used for shape-supported boundary delineation. If detection is 
not necessary (e.g. when tracking a heart shape in a 3d+time sequence), the necessary 
shape information are properties such as closedness, smoothness and similarity of the 
boundary between time frames. This has been used, e.g., in tracking the heart beat from 
CT using an active surface based on a finite element model (FEM) to restrict boundary 
variation between time steps [21]. This concept has been extended to an elastically 
deforming shape model (e.g., the mass-spring model [22] or an FEM [23]). Such model 
restricts organ variation to elastic deformation of a prototypical shape. This is particu- 
larly simple for a homogeneous, linear-elastic FEM since only few parameters regulate 
its stiffness. Defining such a prototype solves part of the optimization problem in detec- 
tion as well, since - similar to active contour models - rather basic image information 
can be embedded by forces acting on a model instance, locally attracting it to the object- 
of-interest. 

However, an elastic model provides just an approximation of intra-class shape varia- 
tion. To some extent, either the delineation goal or the detection goal has to be sacri- 
ficed if the shape of the object itself or its variability is complex. If stiffness 
overestimates true shape variation, the model instance may adapt to the object bound- 
ary given sufficiently reliable image information, but shape deformation may not be 
used for object detection. If the shape variation is overconstrained it may be used for 
object detection but will result in a poor segmentation of the object. 

Instead of training complex shape variation, some of this variation can be efficiently 
represented using part-based models. A shape is decomposed into simpler parts. Variation 
is then that of the parts and that of part-relationships. In other applications, part-based 
models have been successfully used for articulated object detection (e.g., detection of faces 
and people [24], or pedestrian recognition [25]). Although part- relationships may be learnt 
[26], an advantage is that the qualitative knowledge of a decomposition of an object into 
parts is often readily available (e.g. the decomposition of the spine into a sequence of 
vertebrae separated by disks). We successfully applied a combination of deformable mod- 
els with a part-based representation for several object detection and boundary delineation 
tasks in medical image analysis. 

With this paper we describe how to use our part-based deformable model for object 
detection and segmentation and illustrate the applicability to different tasks in medical 
image analysis. 

Methods 

As stated above, variation represented by deformable models often does not have to be 
exact. However, if it is to be specified by the user instead of being trained from sam- 
ples, parameterization of the model should be intuitive. Representing the object as an 
elastically deformable material meets this condition since most users have an under- 
standing of how elasticity influences shape variability. Elastic models have been used in 
medical image applications for quite some time (e.g., [27] for elastic registration), 
although not often to restrict shape variation for detection tasks. We use this concept 
to replace trained shape variation. It should be noted that we do not want to simulate 
deformation behavior of an object (e.g., the deformation of the heart ventricles over 
time). Elastic deformation is solely used to replace distribution estimates to restrict 
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shape variation among objects of the same class. Since describing complex shape varia- 
tion by few elasticity parameters may be too simple, we use deformable objects as 
components of a part-based model. Relationships between parts are modeled by an 
elastic superstructure. 

Object detection consists of a local and a global part. Localized delineation lets a 
model instance be attracted and deformed by image features. The global search is rea- 
lized by repeatedly generating random initializations of model instances and selecting 
the best fitting candidates. Fitting is defined according to equation (1) by a data term 
that is regularized by shape deformation. 

Finite Element Models (FEM) 

We use FEMs to represent the deformable shape. A finite element model represents 
linear-elastic deformation of a 2-D or 3-D object based on a discretization of the object 
domain into finite elements e. Elements are generated from node locations that sample 
the object domain. The number of nodes depends on the accuracy with which an 
object is described. For a given object, the FEM is generated from a 2-D or 3-D repre- 
sentation of the object's shape, either based on a priori knowledge, e.g., about the 
shape of a vertebra, or from an example segmentation. If segmented data is used, it 
should reflect a representative object example. It may also be helpful to smooth the 
segmentation in order to remove irrelevant detail from the model instance before gen- 
erating the FEM. The FEM may be generated by triangulating the segment, e.g., by 
computing a Delaunay triangulation. 

For a single element, external forces fte> at element nodes cause node displacement u ( e \ 
Their relation is governed by a stiffness matrix n( e > 

KM u «=fM. (2) 

The displacement depends on applied forces, the elasticity and geometry of the ele- 
ment, as well as on the interpolation functions used to compute continuous material 
deformation within the element. When linear interpolation is sufficient - which is true 
for our applications - element shape and elasticity parameters are the only influences 
that define the deformation behavior. 

The different elements e of the FEM are assembled according to the decomposition 
of the object domain (for details see [28]). It results in an aggregated stiffness matrix K 
which gives the following relation between forces f and displacement u via 

Ku = f. (3) 

Element- wise assemblage happens in two steps (see Figure 1). First, nodes of all ele- 
ments are re-labeled so that nodes of two different elements have the same label only 
if this node is common to the two elements. Then, expanded matrices K^j, and force 
vectors f^, are created for each element e having a size that depends on the total num- 
ber of nodes of the FEM. Locations in K^j, and f^j, that represent labels not occurring 
in e are filled with O's. Assemblage is then simply summing expanded matrices and 
vectors. 

The FEM approximates the mean shape and possible physics-based deformation of 
the object-of-interest. Shape variation among different exemplars from the object class 
is given by the FEM decomposition and few material parameters. Assuming that the 
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Figure 1 Assemblage of an FEM from elements. Elements are assembled by first re-labeling nodes, then 
extending the stiffness matrices accordingly (non-zero entries are indicated in blue) and then adding the 
matrices. 



model consists of isotropic, linear elastic material, the elasticity of the elements can be 
described by Poisson's ratio and the elastic modulus [29]. The Poisson's ratio p is the 
ratio between deformation in the direction of an incident force and deformation ortho- 
gonal to it (see Figure 2a). For most materials, 0 > p > -1, since compression in one 
direction causes expansion in the orthogonal direction. For object detection, it may be 
useful to define a positive Poisson ratio. This would cause contraction in all directions 
if forces in some direction cause compression. Scale invariance can be implemented 
(to a small extent) using this parameterization. 

The elastic modulus, given the isotropic material assumptions above, may be 
described by Young's modulus. It tells how an isotropic material resists deformation 
due to opposing forces (see Figure 2b). Changing Young's modulus controls the 
amount of deformation due to external forces, e.g., image-derived forces. 

Given element geometry, interpolation functions, Poisson's ratio and Young's modu- 
lus, the stiffness matrix can be computed for each element and assembled into global 
stiffness matrix K. Hence, the FEM, created from an exemplary segmentation or from 
a priori knowledge, represents a deformable shape by very few parameters that are 
intuitive in a sense that a user does not have to understand the underlying numerical 
concepts of the FEM method. 

Dynamic optimization and object detection 

Using the FEM for object detection requires localizing and deforming the model based 
on image-derived forces. The extension of the FEM to a dynamic model has the practi- 
cal advantage that parts of the localization tasks can be solved as a time-dependent 
attraction of a model instance to local image features. It requires extension of the 
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Figure 2 Factors that influence the elasticity of an FEM. (a) Poisson ratio p represents how 
displacement due to an incident force is transferred orthogonal to the direction of incidence. In (a) the 
Poisson ratio is negative, since a contraction in force direction causes an extension orthogonal to it. (b) 
Elasticity represents the amount of deformation caused by an incident force. 
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governing equation by a mass matrix M, which represents resistance to forces based on 
the current acceleration ii(f) at time t, and by a damping matrix D that represents 
transfer of kinetic energy dependent on current speed u (t): 



Specifying masses in M is necessary if a moving model instance should resist force 
changes from the image in order to let it move over spurious image details from noise 
or artifacts. If this is not necessary, mass may be omitted, leading to a damped gradient 
descent. 

For practical reasons, we use Raleigh damping, which defines D as linear combina- 
tion of M and K: 



If the system is mass-free, we may set D = I + J3K, where I is the identity matrix. 

External forces acting on FEM nodes may be defined separately and differently for 
every node. It is, however, practical to define node groups and let each node group be 
attracted by the same kind of image forces. Examples are the following: 

- Boundary nodes are attracted by boundary features in the image 

- Inner nodes are attracted by image features according to the specific appearance 
of the object to be detected 

Different kinds of boundary or appearance nodes can be defined if the expected edge 
strength or appearance varies in an object-specific fashion. 

External forces can be computed based on the current location and displacement of 
nodes with respect to image features in the model's vicinity. This may become neces- 
sary if the data is unreliable and current node locations and displacement are used to 
select image regions evaluated for force computation. An example will be demon- 
strated in Application 2 described below. 

If data quality permits, external forces can be pre-computed as gradient of a potential 
field derived from the image. Potential fields may be defined separately for each node 
group (see Figure 3). Nodes of the FEM are pulled to a minimum in the potential. In 
order to attract an FEM instance over some distance, the potential field is convolved with 
an influence function, i.e., a low pass filter that is decreasing with distance from the kernel 
center. Local features for boundary nodes are edges or edge features such as gradient 
length. Local features for appearance nodes depend on expected intensity or texture. The 
influence function is a Gaussian of which the variance determines the influence radius. 

It should be noted that, while boundary nodes relate visible object boundary in the 
image directly to model nodes, appearance nodes typically do not. Since expected 
appearance (e.g., brightness) may or may not be constant over the object it may not 
help to move and deform an FEM instance into its proper place. Appearance nodes 
are useful nonetheless: 



Mu (t) + Dii (t) + Ku (£) = f (t) • 



(4) 



D = giM + jSK. 



(5) 



- A single or several appearance nodes may help to determine whether the bound- 
ary nodes - which are only sampling the boundary - sit on the correct object 
boundary. 
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y potential 
t t 

Figure 3 Example of a potential field. The field is generated from a weighted combination of a T1- and 
a T2-weighted image for attracting nodes to vertebrae. It is clearly visible that the minima for the 
vertebrae are just local minima, since several internal organs produce a much more pronounced minimum. 
Detection will therefore require additional information from shape and configuration of vertebrae. 



- Appearance nodes that sample the appearance inside the object and in the back- 
ground in the object's vicinity may serve as a complex boundary model in cases 
where the boundary is difficult to detect in the data using a local operation such as 
edge detection. 





Since local features are not unique (otherwise, detection would be trivial), the influ- 
ence radius, i.e., kernel width should be as large as possible in order to attract FEM 
instances from far away, but small enough to avoid overlapping influences, i.e., blurring 
from different local features. 

Given particular potential fields, an FEM instance can be placed anywhere in the 
image and deforms under the field-derived forces. The potential field is constant over 
time and may be pre-computed. Forces f change with the displacement in the potential 
field and, therefore, with time. An instance has found its final destination when inter- 
nal forces from deformation and external forces from the image balance each other. 
Computing adaptation of the deforming FEM instance to the image requires computa- 
tion of the dynamic system described by equation (4). 

Computation of this system of dependent differential equations can be done in several 
ways (see, e.g., [28]). We prefer a computation via decorrelation into independent modes. 
It does not only produce a stable solution to the problem but also allows selecting relevant 
deformation modes (similar to the selection of variation modes in ASM). This has been 
used for object classification [30], mapping between similar objects [31], and to provide a 
base for training an ASM, if training is desired and possible [32]. 

For optimization, a generalized eigenproblem is solved for mass matrix M and stiff- 
ness matrix K 

KE = MEA with E T KE = A and E T ME = I, (6) 

where A is the diagonal matrix of eigenvalues, E is the column matrix of eigenvec- 
tors and I is the identity matrix. Mapping K and M onto E and assuming Raleigh 
damping according to equation (5) results in a new system 

M'c (t) + D'c (t) + K'c (t) = E T f (t) 

with M' = E T ME = I, D' = E T DE = diag {d u d n ), K' = E T KE = A (7) 
and Ec(£) = u(f), Ec(t) = u(t), Ec(t) = u(t). 
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The new diagonalized system contains simple differential equations of the type 



where e; is the 2-th eigenvector of E and A; is the r-th eigenvalue. An analytical solu- 
tion for such equations can be computed stably and fast. The approach is applicable 
for a mass-free system as well. 

Eigenvectors and eigenvalues carry semantics similar to ASM. An eigenvector is 
called a mode of vibration and represents a generalized symmetry axis of shape defor- 
mation. The overall deformation is a weighted sum of the generalized deformations 
due to the transformed forces E T f(t). It is possible to reduce the number of modes in 
order to remove small information details and reduce computational costs. Eigenvalues 
characterize the amount of force necessary to cause displacement. Assuming ascending 
order of eigenvalues, the first n\ eigenvalues of an M-dimensional FEM will be zero and 
represent rigid transformation (rotation and translation). The following, intermediate 
eigenvalues are the most relevant modes representing shape deformation. 

Rigid transformation is still part of the model, which is different to ASM where the 
trained model is normalized with respect to rotation and translation and where this 
information has to be incorporated into the fitting process by means of an additional 
registration step. This causes problems, however, since deformation constrained by 
the stiffness matrix K is defined as strain caused by a directed force on an infinitesimal 
line at this point. Since this line is defined in the given coordinate system, rotation of 
the FEM instance would make the stiffness matrix K dependent on the current rotation, 
hence dependent on time t. This is unwanted since it would require re-computation of 
K and of the vibration modes at every new iteration step. Neglecting it would cause 
some part of the rotation to be interpreted as deformation possibly leading to serious 
distortions (see Figure 4) [33]. This is only acceptable if we assume that local fitting of 
the model instance comprises only little rotation. Still, this would result in a loss of the 
benefit of including rotation into the optimization. Hence, we use warping techniques, 
presented, e.g., by [33,34] to circumvent the problem. 

The basic idea of warping is to include the current rotation into the representation. 
Therefore, forces are applied to the un-rotated model and the result is rotated back. If 
warping is applied on the node level, the current rotation R, is computed for all nodes 
i. They are concatenated into an orthogonal matrix R of which its inverse is applied to 
the current displacement u(t) as well as to its derivatives. Deformation is then applied 
to these vectors and the result is rotated back (the impact of this correction can be 
seen in Figure 4). It can be shown that this can be applied in the spectral basis as well 
[33]. This essentially means that the current vibration modes are "rotated" versions of 
the original modes. 

The current rotation can be computed by several methods. A fast way is to search 
for the rotation R, at node coordinates x, that optimally registers difference vectors 
d,y = x, - x, between x, and adjacent nodes x, relative to difference vectors d,y 0 at t = 0. 
The resulting optimization problem 



di(t)+d' i ii(t)+l i c i (t) = e i r »f t (t), 



(8) 




R(t) 



(9) 



for each node can be computed fast using a method proposed by [35]. 
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Figure 4 Modes of vibration of a 3-D stick model. The first n! modes of an n-d model (in this case the 
first 6 modes of the 3-D model) represent rotation and translation and are not shown. Only odd modes 
are shown since, in this case, deformation of mode 2k is the same than mode 2k-1, except that it deforms 
orthogonal to mode 2k-]. If warping is omitted (upper row), local rotation is resulting in unwanted 
deformation. 



Dynamic optimization of an FEM instance as described above produces a locally 
optimal fit. For global optimization we use a stochastic initialization technique similar 
to the method that was suggested by [1]. FEM instances are initialized at different loca- 
tions in the image and serve as detection candidates for the object-of-interest. After 
performing local optimization as described above the best candidates spawn new 
instances in their vicinity. The process ends when no further improvement can be 
reached. 

Rating candidates requires definition of a QoF function. It is defined according to 
equation (1). Data quality E exmna i measures the value of the potential field at final node 
locations and the regularization term Ei nterna i measures the deviation of the model from 
its initial shape based on the weights of the vibration modes. 

Computational complexity of the process is 0{N) where N is the number of nodes of 
the model instance. However, computation times depend on the convergence of the 
various iterative optimization schemes, namely the iterative deformation, the iterative 
computation of the singular value decomposition in [35] for computing the warping, 
and on the number of iterations in the stochastic search. 

Hierarchical part-based model 

The few parameters of the deformable model described in the previous section are suf- 
ficient for object detection as long as the object in question has a rather characteristic 
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mean shape and appearance. If this is not the case, training of shape variation such as 
in ASMs would help. However, to reduce training effort (ideally up to a point where 
all parameters are user-specified based on domain knowledge), information has to be 
included in a qualitative rather than a quantitative way. Hence, we adopted the princi- 
ple of part-based models for augmenting the descriptive power of our model. 
Representation by a part-based model serves two different purposes: 

- The parts may be sub-objects of the object-of-interest (such as the vertebrae of a 
spine model). 

- The parts may relate the object-of-interest to surrounding objects that help to 
localize the object. 

Additionally to the elastic deformation of the parts, division into parts, their relation- 
ship to each other, and potential mutual deformation are relevant for model 
specification. 

A part-based model can be realized conveniently within a hierarchical FEM (HFEM) 
framework [36-38]. The shape is decomposed into parts based on user's specification. 
Each part is represented by an FEM. This constitutes the morphological layer of the 
complex object. Parts may have different material properties and different potential 
fields to accommodate their different semantics. Spatial relationships between parts are 
represented by a second-level FEM which constitutes the structural layer. Forces on 
the structural layer FEM are exerted from deformation and displacement on the mor- 
phological layer of the HFEM. They, in turn, impose constraints on the shape of its 
structural layer. 

The HFEM may be realized explicitly by generating a set of morphological and struc- 
tural FEM or implicitly by generating a single, heterogeneous FEM from the morpholo- 
gical and the structural FEM. 

Explicit representation results in a set of independent morphological FEM that are 
coupled to the structural layer via virtual zero-length-springs that connect nodes of the 
two layers [36] (see Figure 5). Relationships between sub-shapes are specified by the 
kind of connection between the layers. For instance, the following relationships can be 
defined for a 2-D model using different numbers of nodes (see Figure 6): 

- A single-connected shape may rotate independently but is restricted in distance 
to other sub-shapes via the structural layer. 

- A sub-shape that is connected via two nodes to the structural layer is further 
restricted in its rotation. Properties such as approximate orthogonality or paralle- 
lism of parts can be ensured by this kind of connection. 

- A sub-shape connected by more than two nodes shares some of its non-rigid 
deformations via the structural layer. 

The influence of external forces between layers is then realized by a message passing 
algorithm [37]. Displacements due to deformation on the morphological layer act as 
external forces on the structural layer, while deformations on the structural layer cause 
forces on the morphological layer. Computation of the dynamic behavior, hence, 
requires solving the optimization problem presented in the previous section for each 
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Figure 5 Hierarchical FEM. The structural layer is connected to the morphological sub-shapes via virtual 
links. Each displacement from one layer is transferred as force on the other layer. 



sub-shape on the morphological layer, passing the resulting forces to the structural 
layer, computing the resulting deformation and passing displacement back to the mor- 
phological layer. 

In the hierarchical representation, each FEM is diagonalized separately resulting in its 
own vibration modes. It allows selecting vibration modes independently for each FEM. 
Hence, different requirements on precision, e.g., between structural and morphological 
layer, or for different substructures on the morphological layer can be accommodated. 

When computing the quality of fit, equation (1) has to be computed first for energies 
Ei of each subshape / on the morphological layer, resulting in a weighted sum of mor- 
phological fits 

Etotal = 2_, J=1 W i E j- ( 10 ) 

The weights Wj may be used to account for a different importance of parts. If, for 
instance, some parts are just guiding structures to find the actual object-of-interest, 
their fit has not to be very precise. 

Since the part relationship may be decisive for detecting the object, deformation and 
image support is then computed for the structural layer FEM as well. Deformation is 
measured in the same way as for the morphological layer. The external energy is now 
given by the state of the morphological sub-shapes. Hence, it is simply defined by the 
QoF from the sub-shapes of the layer below. 




Figure 6 Different kinds of connection between layers cause different kinds of restrictions (a) 

Single-connected nodes allow for independent rotation around the nodes, (b) Double-connected nodes 
can be used to enforce approximate orthogonality or parallelism (this example) between sub-shapes, (c) 
Connecting further nodes of the sub-shape with the structural layer enforces co-deformation, such as, e.g., 
curvedness of the sub-shape (left sub-shape in 6.c) or its size (right sub-shape in 6.c). 
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Alternatively to representing the part model as a hierarchy of homogeneous FEM it 
may also be represented as single heterogeneous FEM [38] (see Figure 7). Heterogene- 
ity refers to the fact that it will be a combination of the different part-FEMs, each of 
which possibly having different elasticity parameters. 

Morphological and structural layer FEM are defined as above. However, instead of 
treating them as single FEM, which communicate through message passing, the FEM 
are assembled in the same way as elements were assembled to the FEM. Assemblage is 
guided by node connections between structural and morphological layer as before. 
Through the assemblage these nodes are treated as shared nodes between elements of 
the morphological and the structural layer. 

Solving the part-based FEM optimization by assemblage in a single, heterogeneous 
FEM has different properties compared to the previous method: 

- Computation of vibration modes applies to the complete part-based object and 
optimization does not require message passing between deformable part models. 

- The quality-of-fit is computable in the same way than for a single FEM based on 
deformation and input from the data. 

- Different potential fields and/or different material properties of the part-FEM 
replace the weights in equation 10. 

- If warping is used on the node level this applies to the complete FEM. Hence, 
single-connected FEM do not allow independent rotation. 

In general, the second solution is simpler regarding computation of dynamic defor- 
mation and stochastic optimization, but it also integrates parts tighter into the repre- 
sentation. Therefore, it depends on the application which of the two methods should 
be preferred. This will be discussed further in the next section where we will present 
different applications using the two approaches for different analysis goals in medical 
image analysis. 

Results and discussion 

In the following, we present a number of applications to illustrate how to parameterize 
and use a part-based deformable model for various tasks in medical image analysis. 
We will summarize performance and results for each application. Since we use the 
application to demonstrate the use of the model, we refer to the original publications 
[38-40] for details. 




Figure 7 Integration of morphological and structural sub-models in a single FEM. Structure and 
morphological layer are connected and elements of the different FEM are assembled to a single FEM. 
Elasticity parameters of the morphological and structural FEM may be different. 
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Indentifying Heschl's gyrus in flat maps of the human cortex 

Experiments with functional magnetic resonance imaging (fMRI) attempt to localize 
and delineate particular brain regions, such as the human primary auditory cortex 
(pAC) and neighboring higher-order areas, in vivo. The pAC is known to be located 
on the first transverse temporal gyrus (i.e., Heschl's gyrus, HG). Since the region cov- 
ered by the pAC is very small with respect to the spatial resolution of fMRI and the 
signal to noise ratio is rather poor, fMRI data of a population of subjects are combined 
to arrive at a representative functional map. The combination of individual data to a 
group map requires the mapping of corresponding regions across subjects. We solved 
this registration task by localizing macro-anatomical landmarks (i.e., the lateral "Syl- 
vian" fissure and superior temporal sulcus) that delineate the superior temporal lobe 
and enclose the highly variable Heschl's gyrus. The detection is performed using a 
deformable model of HG that is fitted to the cortical surface (i.e., a surface mesh that 
represents the grey- white matter interface, gwl) [36,40]). 

Input data for the deformable object detection task are 2-dimensional flat maps of 
the gwl (details about cortex reconstruction and flattening can be found in [41]). The 
function value at each location in the flat map is the curvature of the folded cortex 
before flattening (see Figure 8). Hence, change from outward folds (gyri) to inward 
folds (sulci) is given by zero crossings in the flat map. 

Identifying HG in a given flat map requires finding a specific U-shaped border at 
those zero crossings. This is difficult since the size, location, orientation and individual 
shape of HG vary substantially between subjects, while there are several gyri of similar 
shape in each cortical flat map. Experts identify the correct gyrus by means of its rela- 
tion with two anatomical landmarks: the Sylvian fissure (SF) and sulcus temporalis 
superior (STS) are approximately parallel to each other and orthogonal to HG. 

A part-based deformable model is well suited to represent anatomical descriptions 
such as U-shapedness and spatial relations between cortical gyri and sulci. 

HG, STS and SF were modeled on the morphological layer of a two-level model and 
combined on a structural layer that models the structural configurations of the macro- 
anatomical landmarks. Since the gyri and sulci were simple 2-D shapes, they were 
manually drawn based on example images. For HG, two different variants were created 
since some subjects may have a HG with a sulcus intermedius (SI). 

Boundary nodes of the HG and the HG+SI models had access to smooth potential 
fields, whose minima represent zero crossing locations of the flat map. In practice, the 
filtering operations were approximated as follows. For a given surface mesh we com- 
puted a discrete, difference-of-Gaussian filtered version of the curvature mapping: 
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We first applied an operator that separated convex regions (gyri) and concave regions 
(sulci), and then we used a discrete approximation of a heat diffusion kernel to smooth 
the resulting binary map. The low pass filter kernel widths of <7i = 2 mm and <7 2 = 2<7i 
for the difference-of-Gaussian provided a good tradeoff between a large region of influ- 
ence and blurring of adjacent zero crossings. During the model fit, we estimated for 
each model node the vector to the nearest salient vertex on the cortical surface mesh 
(with a local maximum filter response and within a given sampling distance) and used 
the weighted radial component of this vector as external force. 

Internal nodes of the HG model responded to positive curvature (gyri) only, whereas 
the HG+SI model contained nodes responding to negative curvature (sulci) at the SI 
location as well. Again, the potential fields were in practice defined based on heat ker- 
nel smoothed versions of the binary curvature maps. 

The shape and pose of HG, the exact relations between HG and the landmarks and 
also the shape of the landmarks SF and STS vary dramatically between subjects: 

- Since SF and STS primarily served as limits for restricting the cortex region to be 
searched for HG, it was important to robustly and correctly localize these land- 
marks, while only roughly matching their main branches. The STS and SF models 
were constructed as simple line-shaped structures, undersampled with FE nodes 
that responded only to appearance, i.e. curvature information. This sparse sampling 
allowed bridging gaps in the curvature maps, while sufficient similarity of the struc- 
ture to a line was still ensured, (example images showing variability of SF, STS) 

- The model at the top-layer arranged HG (and HG+SI, resp.) as the central part of the 
AC folding pattern nearly orthogonal to the two surrounding parallel sulci. To set up a 
sparse top layer model, "link" nodes were identified for related sub-shapes and dupli- 
cated such that the resulting structural model consisted of these shared link nodes 
(see Figure 9). Four nodes represented the parallel arrangement of SF and STS in the 
2-D flat maps, and the internal node was linked with the HG model to position it 
"above STS" and "below SF". In the structural prior model HG+SI an additional node 
was linked with a simple SI model in a "contained in HG"-relation. 

- The FE nodes are then subject to boundary conditions, such that any nodal dis- 
placement in the morphological coordinate frames (e.g., due to image forces) 
enforced displacements in the global model coordinate frame, while the resulting 
nodal displacements of the top-layer model acted as across-level spring forces on 
the link nodes of the morphological shape models. 




Figure 9 Generation of the hierarchical model from an example image. Models of Heschl's gyrus (in 

this case without a sulcus intermedius - SI), sylvian fissure (SF) and sulcus temporalis superior (STS) are 

connected via virtual links to the sparse structural model depicted on the right. 
* ) 
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- The search for the independent optimum pose parameters (of the structural 
model as well as HG and HG+SI, resp.) was based on a quasi stochastic sampling 
of an a priori constrained search space. To define the constraints, we used the 
Talairach-transformed flat map coordinate system [40,42] and asked an expert to 
annotate an example and specify possible variations in the location, size and orien- 
tation of the anatomical structures. This information was encoded in terms of pose 
parameter distribution functions. 

- Young's modulus was 2.0, Poisson ratio was 0.4 and material density was set to 
1.0 for all models. 

At each iteration of the deformable model search, a population of 100 model 
instances was initialized with random affine parameterizations and fitted to the data. 
Randomness was introduced into this evolutionary process by employing a rank selec- 
tion of fitted instances. Reproductive success varied with the relative "fitness", i.e. qual- 
ity, of instances. Poor fitting results were deleted, and Gaussian noise was added to the 
pose parameters of the best fitting instances to simulate the "mutation" in the repro- 
duction step and initialize new deformable model instances. This evolution process 
ends if no improvement in overall fitting quality was observed (typically after 3 steps). 

Success of the deformable model search for HG was measured by determining the 
percentage of cases were the correct gyrus was identified by one or more model 
instances among the 2% of best rated candidate solutions. The method was applied to 
flat maps from 80 subjects. A detailed discussion of results can be taken from [36,40]. 
In summary, we could show that 

- Using just the HG model without guiding structures and constraints on possible 
poses resulted in a 5% success. By constraining the search space, the correct gyrus 
was identified in about 50% of all cases. 

- Using the HG model together with SF and STS resulted in 80% success which 
can be improved to more than 90% if model parameters have been trained. 

- This part-based model could be directly used (1) to compute precise segmenta- 
tions of HG with less than 3 mm error compared with manual segmentations of 
HG and (2) to classify the given cortical surfaces correctly with respect to the pre- 
sence of a sulcus intermedius. 

- Using a single-layer model that comprises HG, STS and SF instead of the multi- 
ple layer model led to a significant decrease in result quality which did not improve 
by training. This means that the relevant anatomical information was better cap- 
tured by the structural decomposition and deformation parameters. 

- The method is very robust to changes in the parameterization. 

This example shows that domain knowledge, such as the structural arrangement of 
landmark structures that predict the location of a highly variable object of interest 
within data that carry poor semantics, can be directly formulated in the form of effi- 
cient constraints of a hierarchical, deformable model. This is very interesting in that 
one could expect that such a model provides a better symbolic representation of the 
"true" object anatomy and anatomical variability than a model that is learnt from 
annotated training data. Moreover, the expert knowledge can be more efficiently 
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improved by training (e.g., of correct poses) and expressed with less effort (e.g., by 
annotating a single example and "painting" connecting relations such as "parallel to" 
and "contained in" between the different structures, by accepting good solutions, apply- 
ing interactive forces during the model fit or by correcting the deformed shape of 
poorly fitted model instances). 

Segmenting the Substantia Nigra in transcranial sonography 

Transcranial sonography (TCS) produces ultrasound images of the brain that are 
acquired by imaging through the temporal bone window. The mesencephalic brain 
stem, or midbrain, containing the substantia nigra (SN) is visible in TCS images, 
although image quality compared to regular ultrasound images is poor. 

The echogenicity of the SN is a relevant feature in the diagnosis of Parkinson's dis- 
ease [43]. Computer-guided delineation of SN in TCS images has been the goal of the 
work presented in [39]. We observed that the butterfly-shaped midbrain section 
imaged by TCS is fairly invariant across patients as is the SN location relative to the 
midbrain. Since image artifacts inhibit a direct segmentation of the SN on the basis of 
low level image features, such as intensity or gradients, we used an elastically deform- 
able model for the detection of the midbrain. This is again a part-based model, but 
here we used a hierarchy of morphologies in a sequential fit (see Figure 10). The first 
layer consisted of two SN regions that were attached to a midbrain shape model on 
the second layer. The shape model is then used to constrain the final segementation 
(instead of using the shape itself for segmentation such as in [44]). 

Since artifacts and noise severely distort the images, it was not possible to compute 
smooth and reliable potential functions by linear filtering. To set up external image 
forces, we used a regional classifier over image regions that are large enough to coun- 
teract this effect and provide object boundary information. Three different node poten- 
tials were defined to account for the non-deterministic image signal: 

- Appearance nodes of all parts have a similar potential function: SN nodes reacted 
on high echodensity, since the SN typically produces more reflections than sur- 
rounding midbrain tissue. The internal nodes of the SN models sample a Gaussian 
low pass filtered version I* = G a * I of the image / (where a = 2.0), and the inten- 
sity forces are f = kNI*, with k > 0. For the echopoor midbrain, we expect low 
intensities in the interior and let k < 0. 




Figure 10 TCS of the midbrain (left) and derived model (right) The model T(2) on the upper layer 
represents the midbrain morphology which has been attached to substantia nigra (SN) models T1(1) and 
T2(2). After fitting model instances to the data, the boundary nodes of the fitted Ti(2) model instances are 
selected and then subsampled as contour models CI and C2. These are finally used as Active Contour 
Model for segmentation (Figure taken from [39]). 
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- The boundary FE nodes did not rely on intensity gradients since these were extre- 
mely unreliable. Instead, we computed at each iteration of the model fit a balloon- 
force f b = n b n in the model instances' current contour normal direction that pulls 
the nodes towards more robustly estimated inter-tissue boundaries. The magnitude 
and sign K b of the balloon force was defined using regional texture information. 
We dynamically computed an optimal discriminant between "object" and "back- 
ground" based on statistics over image intensity samples taken from the inside of 
the model and from the background. The sampling regions were defined in the 
image in inward and outward normal directions at the boundary nodes of the 
model instance. 

- Young's modulus was 0.9, Poisson ratio was 0.25 and material density was set to 
0.9 for all models 

As in the previous section, a deformable shape search computed the best fitting shape by 
simultaneous optimization of multiple two-layer model instances. In this application, how- 
ever, the search was performed sequentially on the global (midbrain localization) and local 
context (detection and segmentation of the SN). After finding the best fitting instance of 
the midbrain model, multiple instances with different parameterizations of the SN models 
were aligned to it and matched to the data to detect the hyperintense SN regions. This 
sequential process was necessary because the SN appears as a stripe-like structure on both 
branches of midbrain, but regularly exhibits the same echotexture as the adjacent brain 
tissue. That is, the midbrain serves as "guiding structure" for the contained SN, but for a 
given TCS image it is neither known in advance whether the SN is clearly visible, nor how 
much an echodense pattern of the SN extends. The fit of SN models should not influence 
the deformation of the midbrain model instance. 

Successful detection means that the true boundary of the SN will be in the vicinity of the 
boundary estimate by the SN sub-shapes. A final segmentation should make use of this 
information (e.g., shape-driven level sets [15] or graph cuts [12]). In the last step of our 
algorithm, the boundaries of the two deformed SN templates were taken as initial place- 
ment of Active Contour Models, which are then locally deformed to precisely adapt to the 
SN boundaries in the TCS image. The template contours were re-sampled to increase the 
sampling density and flexibility of the contour models. Internal forces and external balloon 
forces were set up as described in [45] and above (see [39] for details). 

The model was applied to 10 data sets, for which expert segmentations were avail- 
able, and it was found that in all cases the echogenic patterns of the midbrain and SN 
were correctly localized (Figure 11 shows an example). Small values of 1.03 + 0.44 mm 
boundary error at a pixel size of 0.1482 mm showed that the region boundaries were 
also outlined with high precision. This example application shows how to combine a 
model search with a constrained segmentation without training based on qualitative a 
priori knowledge about the appearance of the object. It also shows that the model is 
able to overcome problems from serious distortions in the image by replacing gradient 
information by a more elaborate boundary model. 

Detection of human vertebrae in MRI 

Investigation of normal variation of the anatomy of the spine and its vertebrae is one 
of the research questions within SHIP (Study of Health in Pommerania, [46]). We use 
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(a) (b) (c) (d) 



Figure 11 Example for the fitting process, (a) Best fitting midbrain model instance (b) induced 
initialization of the SN model instances (c) deformed SN instances (d) result after active contour 
segmentation based on (c). (Figure taken from [39]). 

MRI from SHIP to detect the course of vertebrae of the lower back. More than 40 dif- 
ferent MRI image sequences have been acquired within SHIP from several thousands 
of subjects in Pommerania. Two of the sequences - a Tj-weighted and a T 2 -weighted 
sequence - showing spine and vertebrae were used for the detection task using our 
model-based approach. Although vertebrae detection and segmentation focuses on 
radiographs and CT images, MRI-based analysis has been the subject of research in 
medical image analysis as well [47]. The domain knowledge in existing methods is 
often represented by a specific combination of processing modules where model infor- 
mation is inserted at several stages. Our goal was to investigate whether this can be 
replaced by our deformable part model. The advantage would be that adapting the 
detection to some other application would solely relate to model generation without 
having to change modules of the search process. 

Global optimization was not used since initialization is simple for the given data. The 
user places the model instance in a sagittal view on the middle slice of the image sequence. 
Optimization then takes place by model deformation based on local image attributes. 

The model was constructed according to the appearance of vertebrae and spine in a 
sample image sequence. Main attractor is the spine which is clearly visible in all images. 
Hence, the model consisted of a two-level hierarchy where vertebrae sub-shapes were con- 
nected with a spine sub-shape by a structural model on the second level (see Figure 12). 
Vertebrae sub-shapes were constructed all equal since no substantial variation is expected 
between different vertebrae of the lower back. The spine model supported proper localiza- 
tion of the vertebrae. Since its most discriminate aspect was the cylindrical shape, it was 
represented by a deformable, straight cylinder consisting of appearance nodes only. 
Appearance nodes represent the vertebra shape, since the relatively low and varying signal 
of the gradient allows only for small values of <7 in the smoothing function if boundary 
nodes were chosen. This strategy bears some similarity to the boundary model that was 
employed in the previous example for localizing the midbrain in TCS. 

For each of the two shape models, the vertebra and the spine, a weighted combina- 
tion of the Tx-weighted and the T 2 -weighted image was computed as appearance input 
(Figure 3 shows computation of the vertebra potential function). Weights for each of 
the two models were determined a priori and produced a clearly recognizable local 
minimum for vertebra and spine appearance, respectively. 

We used a single heterogeneous model, since the sub-shapes formed a common unit 
(the spine model) for which, e.g., vibration modes are computed and selected. The 
restriction that sub-shapes cannot rotate independently around a single connection 
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lower level: spine and vertebrae 




Figure 12 Two-level FEM for spine and vertebrae. The morphological sub-shapes are connected at 
nodes to a structural model that constrains transformation and deformation of sub-shapes with respect to 
each other. 



was not critical. It was even a desired attribute since it very well reflected anatomical 
relations between sub-shapes. Young's modulus was 1.0, Poisson ratio was 0.0 and 
material density was set to 0.1 for all models. 

Model generation from the sample shape was by Delauney triangulation from evenly 
distributed sample nodes. It produces a set of "well-shaped" tetrahedrons. Computing 
the rotation using node warping was solved by minimizing equation 9. Total computa- 
tion time until convergence on a standard quad-core CPU was between 1.1 and 2.6 
seconds per case with an average computation time of 1.5 seconds. 

We evaluated the method on 49 data sets from SHIP. Detection was declared successful 
if the center of each vertebra sub-shape was in the corresponding vertebra in the image 
data. Examples of results can be seen in Figure 13. Vertebrae were detected correctly in all 
but one case. In further 3 cases, minor mis-orientations happened. E = 1 provided an 
almost stationary behavior with respect to detection for parameter changes of more than 
10% for the other parameters. Further details on results can be taken from [38]. 

Since one of the analysis goals is to investigate whether variants exist in a normal 
population, we also clustered shape information of the spinal canal [48]. We currently 
also investigate clustering on the entire model using the weight vectors of the fitted 
model instances for analysis of the shape of the spinal cord. This is future work, of 
course, since detection is not yet followed by segmentation. However, first results are 
promising, since clustering weight vectors into five clusters using k-means clustering 
revealed four shape variants that seem to be anatomically meaningful (see Figure 14). 
Cluster variance, however, is relatively high compared to inter-cluster distances (see 
Table 1) 

Conclusion 

We presented a part-based deformable model that can be used for directly and effi- 
ciently representing domain knowledge necessary for detection of objects in medical 
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case 1: initialization convergence case 2: initialization convergence 

Figure 13 Examples for detection results (initialization and result after convergence) Presently, the 
model does not contain the typical curvature dose to the os sacrum. Detection was satisfactory but 
orientation of L5 was sometimes not correct when curvature was not supported from a strong signal at 
the spine (e.g., case 1). Including expected curvature of the spine might be necessary if the detection 
result is to be used as shape prior to segmentation. 




2 Os W 1 Ot 2 1 " 1 Ot "1 Os 

#: number of curvature extrema sagittal, frontal, s: straight, t: tilted 



Figure 14 First results from clustering the converged model It shows the ability to distinguish 
between shape classes. Improvement is needed, however, since insufficient fit in the lower part of the 
spine degrades the results. 



Table 1 Intra- and inter-cluster distances for shape clustering of the adapted spine 
model. 



Inter-cluster 
distance 


1 


2 


3 


4 


5 


1 


0 


73.2748 


1 69.9500 


123.3598 


81.9592 


2 


73.2748 


0 


1 50.4898 


83.5072 


88.7331 


3 


169.9500 


1 50.4898 


0 


71.1408 


90.5465 


4 


123.3598 


83.5072 


71.1408 


0 


66.5185 


5 


81.9592 


88.7331 


90.5465 


66.5185 


0 


Intra-cluster distance 


63.8863 


60.1210 


51.8204 


50.8078 


50.7387 



Inter-cluster distances are distances between the means of weights of vibration modes for each cluster. Intra-cluster 
distances are average standard deviations of all vibration mode weights. 
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images. Different examples demonstrate its application to context-based detection, 
model-based segmentation, and to shape analysis. 

Establishing and applying such model requires readily available methods for generat- 
ing a triangulated shape and appearance models from a segmentation as well as meth- 
ods for local and global optimization. Domain knowledge of our part-based deformable 
model resides almost completely in the model and not in the sequence of processes to 
fit the model to the data. The model does not require training, since the few para- 
meters (stiffness, decomposition into parts, node potentials, pose parameters) can be 
set a priori and adapted intuitively. However, experience about setting up the model 
will speed up the design process. A number of decisions have to be made: 

- Structural decomposition of the model depends on part-relationships that can be 
derived from knowledge about the anatomy or arrangement objects to be detected. 
In medical applications, parts are typically anatomic substructures or a group of 
neighboring landmark structures that are necessary to determine an object's 
position. 

- Elasticity parameters describe the variability between shape instances and - on the 
structural level - variability of spatial relations between sub-shapes. 

- External forces are set depending on the expected appearance of the object in the 
image. While gradient-based potential forces often allow for exact determination of 
the object boundary, they may be supplemented by intensity or texture-based 
forces if gradient information is unreliable. 

In future we will investigate the potential for parameter training of the model. Training 
of the few parameters in the data and the model terms should be much simpler requiring 
less training data than training of a PDM. An alternative to adapt parameters would be to 
take the detection result as an initialization for the search of an optimal potential field 
given the converged model instance instead of trying to improve the overall performance 
of the model. This may also enhance the potential of the model to guide data-driven seg- 
mentation schemes such as level set segmentation or graph cuts. It will also be worthwhile 
to take a second look at the analysis of shape parameters for the detection of shape classes 
using adapted model parameters. 
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